Abstract: Educational data mining is the process of converting raw data 
collected from educational databases into useful information. It helps in 
framing and answering research questions such as predicting students’ academic 
performance, identifying the factors that affect that performance, and helping 
teachers understand the difficulties students face with course content and 
subject complexity, so that teachers can act in time to control the dropout 
rate. It also covers improving the teaching-learning process so that 
interventions can be made at the right time to improve student performance. 
This paper reviews the research done in educational data mining on the 
prediction of students’ performance. The factors that influence performance 
include the type of classroom attended (traditional or on-line), socio-economic 
conditions, the educational background of the family, attitude toward studies 
and the challenges students face as the course progresses. These factors lead 
to the categorization of students into three groups: “Low-Risk”, who have a 
high probability of succeeding; “Medium-Risk”, who may succeed in their 
examination; and “High-Risk”, who have a high probability of failing or 
dropping out. The paper also describes ways to improve the teaching-learning 
process by providing students with personal assistance, notes, class 
assignments and special class tests. The most efficient techniques used in 
educational data mining, such as classification, regression, clustering and 
prediction, are also reviewed. 


Keywords: Data Mining (DM), Educational data mining (EDM), Education 
System 

INTRODUCTION 

Educational Data Mining (EDM) is the application of Data Mining (DM) to 
educational data; its objective is to analyze different types of data in order 
to resolve educational research issues [16]. Data Mining is the process of 
extracting useful and important information from data sets. Organizations, 
scientists and governments have used it for many years to work with data such 
as airline passenger records and census data [2]. The volume of educational 
data has increased with the advancement of technology, and it can be handled 
using Data Mining techniques. Educational institutes are also becoming 
automated with the help of advanced technologies. 

Educational research in Data Mining also contributes substantially to 
predictive technologies. Data Mining rests on the premise that historical data 
holds hidden, previously unknown information, and uncovering it is the 
challenging part of prediction. Data analysis is one way of forecasting growth 
or decline in academic performance. The use of the internet and e-learning in 
education has benefited students. Offline education, on the other hand, is the 
medium for exchanging knowledge and developing skills through face-to-face 
interaction, where the tutor can readily observe a student’s behavior towards 
his or her studies. Data mining techniques can be applied to data such as 
students’ behavior towards their studies, their academic performance, family 
background and the data collected from students in classroom interactions. 
Such data help to create student models. E-learning and Learning Management 
Systems (LMS) combine online instruction and communication with administration 
and reporting tools. Intelligent Tutoring Systems (ITS) and Adaptive 
Educational Hypermedia Systems (AEHS), which acquire background knowledge about 
teaching strategy and student behavior, are a few examples of student 
models [17]. 

EDUCATIONAL TASK 


EDM plays an important role for every stakeholder. The following table explains 
how EDM helps each stakeholder in a different way. 


Table I: Various Stakeholders 

Students 

Identify the activities students are involved in that affect their studies. 
Guide them at the right time about their shortfalls and 
recommend relevant material that can help them 
at various levels to improve their grades. 

Mentors 

Analyse students on the basis of their performance and 
provide special attention to those who score lower 
grades. Mentors can also customize the teaching method 
according to the requirements of students. 

Course planner 

Considering student performance, the course planner should 
plan the course content according to the students’ level of 
learning. Flexibility in the course content should be 
considered according to changes in technology and the learning 
ability of the students. 

Different data mining techniques can be used to analyse the data available on 
educational research issues related to students’ academic performance. 
Different systems have been created in various EDM research areas, such as 
E-Learning and Learning Management Systems (LMS), Adaptive Educational 
Hypermedia Systems (AEHS) and Intelligent Tutoring Systems (ITS) [17]. These 
resources play an important role in teachers’ decision making about students’ 
performance, so that the required steps can be taken to improve it. Student 
performance can also be assessed using previous data, such as results from 
class tests, assignments, seminars and in-class assessments [4, 23]. The 
results drawn from this data help teachers as well as students to improve the 
academic record. Based on the marks scored, students can be categorized into 
three groups: “Low-Risk”, who have a high probability of succeeding; 
“Medium-Risk”, who may succeed in their examination; and “High-Risk”, who have 
a high probability of failing or dropping out. Intelligent Tutoring Systems 
are used to classify students into these categories and to predict their 
results after remedial actions are applied [8, 9]. Student performance can be 
affected by various socio-economic factors, such as family income, personal 
interest in the subject and the social distractions around students [10, 52], 
and by students’ perceptions of their studies [11, 4]. Attendance also plays 
an important role, as it indicates a student’s attitude towards studies. The 
hours spent on self-study after college are another important factor, and the 
educational background of the parents and their awareness of the opportunities 
provided by nearby institutes can play a vital role in a student’s 
performance [19, 24, 25, 28]. 
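
The three-way categorization by marks can be illustrated with a minimal 
sketch. The cut-off values and the function name `risk_group` are hypothetical; 
the reviewed studies do not fix particular thresholds, so the numbers below are 
assumptions chosen only for illustration. 

```python
from typing import Sequence

# Hypothetical cut-offs in percent marks; these values are assumptions,
# not thresholds reported by the reviewed studies.
LOW_RISK_MIN = 60.0
MEDIUM_RISK_MIN = 40.0


def average_marks(scores: Sequence[float]) -> float:
    """Mean of a student's percent scores (class tests, assignments, ...)."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def risk_group(marks_percent: float) -> str:
    """Map percent marks to one of the three risk groups."""
    if not 0.0 <= marks_percent <= 100.0:
        raise ValueError("marks must be between 0 and 100")
    if marks_percent >= LOW_RISK_MIN:
        return "Low-Risk"
    if marks_percent >= MEDIUM_RISK_MIN:
        return "Medium-Risk"
    return "High-Risk"


if __name__ == "__main__":
    # Illustrative scores only.
    print(risk_group(average_marks([72.0, 65.0, 80.0])))
```

In practice the cut-offs would be set by the institution or learned from 
historical results rather than hard-coded. 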

The type of classroom a student attends, whether traditional or on-line, 
affects the student’s learning ability and performance [19, 24]. Different 
strategies can be used to judge students’ reactions to a particular approach 
used to assess their performance. Data mining techniques are used to detect 
students’ attitudes towards the assignments given and their behavior towards 
their studies. Based on their behaviours and personal habits, students can be 
grouped as Hint-Driven (students who can solve the problem with a little 
assistance), Failure-Driven (students who cannot score marks even after taking 
help from the tutor or another person providing assistance) and Low-Motivated 
(students who feel they cannot perform in their studies even after every 
successful attempt). The required remedial actions can then be taken to lower 
drop-out rates [9] and to encourage students to perform better. Two data 
sources (logged interface data and eye-tracking data) are used in exploratory 
learning environments to build student models with a data-based modelling 
framework, under both unsupervised and supervised learning [12]. 

To improve the performance of students, various systems that provide immediate 
feedback have been implemented. ASSISTments is considered to be one of the 
best of these systems. It uses the concept of scaffolding questions, which 
break a problem into pieces so that the student can solve complex problems 
[10]. Hints are messages that provide insights and suggestions for solving a 
specific problem, and each hint sequence ends with a bottom-out hint, which 
gives the student the required answer [1, 3, 8, 45, 51, 53]. According to the 
existing literature, these methods give good results in predicting student 
performance [28, 30, 35, 36]. The Virtual Learning Environment (VLE) is an 
online learning environment adopted by open universities, which combine 
students’ VLE data with their demographic data; using the VLE, each student 
can be tracked individually and given more focused support. It highlights how 
students’ patterns of behaviour change during the course as the complexity 
increases. During the course, five to seven assignments are given to the 
students, and the module generally ends with a final examination. The first 
assessment is considered a good predictor of students’ further performance. 
It indicates the right time to intervene, so that the teacher can find out 
which students are at risk and offer them help at the right time. This can 
improve students’ chances of academic success. Online education platforms 
offering Massive Open Online Courses (MOOCs), such as Coursera or FutureLearn, 
use predictive models built with machine learning techniques to identify 
at-risk students. These models give teachers information about students’ 
performance. Four activity types were identified as providing useful 
information for prediction: Resource (books and handouts for the students), 
Forum (the platform on which students communicate with tutors), Subpage (the 
means of navigating the VLE) and Content (specifications and guidelines) [39]. 

TOOLS AND TECHNIQUES 

To predict students’ performance and behaviour towards their studies, data 
mining provides various tools and technologies for extracting valuable 
information from the available data. Models created using supervised and 
unsupervised learning include the Lean model, Item Response Theory Style and 
the Rasch model [10, 48], GUHA and Markov chain-based graphical models [35], 
and Bayesian models [36]. These are a few examples of models that help to 
predict students’ performance and to provide them with the required guidance 
at the right time. Classification techniques are the ones most commonly 
applied to educational data to extract relevant information [9, 12, 23, 25, 
26, 30, 36, 39]. Other methods that can be applied to the data to create 
models include association rules, clustering, sequential pattern analysis, 
decision trees and Bayesian networks [1, 4, 10, 11, 12, 17]. Discriminant 
analysis and neural networks are applied to estimate the efficiency of 
students [1, 2]. To study the factors that affect students’ performance, 
methods such as classification trees and CHAID can be applied [19, 24]. 

Data mining tools came into existence as researchers’ need to access data 
grew. These tools play an important role in filtering data, as they provide 
various methods for processing related data. With the advancement of 
technology, the volume of data is increasing day by day in education, 
business, algorithm development and applied research. Theoretical data needs 
to be converted into computerized data as the volume grows. To process this 
digital data and extract useful information, various commercial and open 
source software packages are available. WEKA is more popular among researchers 
than other tools because it is open source and Java based. Algorithms can be 
implemented in WEKA to obtain the desired results [6, 15, 20, 24, 49], as well 
as for process evaluation and optimization [6, 10]. RapidMiner is a powerful 
software platform that provides an integrated environment for machine 
learning, data mining, text mining and other business and predictive analysis. 
It gives good results with small data sets. It classifies the testing data 
using the 10-fold cross-validation method [43, 46, 47]. 
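
The 10-fold validation idea can be sketched without any particular tool. The 
code below is not how WEKA or RapidMiner implement it; it is a plain-Python 
illustration with hypothetical function names, and it uses a majority-class 
baseline in place of a real classifier so that the example stays 
self-contained. 

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_accuracy(labels: Sequence[str], k: int = 10) -> float:
    """Mean accuracy over k folds of a majority-class baseline.

    A real study would train a classifier on each training fold; the
    baseline here simply predicts the most frequent training label.
    """
    accuracies = []
    for train, test in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct = sum(1 for i in test if labels[i] == majority)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Illustrative labels only; real data should be shuffled before splitting.
    outcomes = ["pass"] * 15 + ["fail"] * 5
    print(cross_validated_accuracy(outcomes, k=10))
```

Each record is used for testing exactly once and for training in the other 
nine rounds, which is why the method suits the small data sets mentioned 
above. 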

The preferred techniques to apply to the data are: neural networks for Moodle 
logs; pattern mining for finding statistically relevant patterns in data 
whose values are delivered in sequence; and, for building prediction trees, 
the C4.5 and ID3 algorithms, which proved better in terms of efficiency [20]. 
Sequential pattern mining is an unsupervised technique of interest for 
capturing students’ experience of using an intelligent system, with 
context-free languages being more expressive than regular ones [11]. The 
classification methods Rule Induction and the Naive Bayesian classifier were 
used to predict graduate students’ grades. The students were clustered into 
groups using the K-Means clustering algorithm [26, 42]. Scaffolding questions 
(which break a problem down into steps so that the student eventually gets it 
correct) were used to obtain better results from students. Different models 
were implemented using these test results, such as the Lean model, Item 
Response Theory Style and the Rasch model [10]. A database of previous 
students was used to predict students’ division using classification 
techniques; among the approaches available for data classification, the 
decision tree method was preferred [41, 44]. To predict student performance 
at the end of the semester, information such as assignment marks, attendance, 
and seminar and class test marks was collected [42]. This information can 
help both teachers and students to raise the level of division achieved. The 
students who needed special attention were identified so that the failure 
ratio could be controlled and the required action taken [23]. 
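
How a decision tree chooses which attribute to split on can be shown with the 
information-gain criterion used by ID3 (C4.5 refines it with a gain ratio). 
The sketch below is illustrative only: the records in `SAMPLE_ROWS`, the 
attribute names and the function names are hypothetical and are not data from 
the reviewed studies. 

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical student records, invented purely for illustration.
SAMPLE_ROWS: List[Dict[str, str]] = [
    {"attendance": "high", "assignment": "good", "result": "pass"},
    {"attendance": "high", "assignment": "poor", "result": "pass"},
    {"attendance": "low", "assignment": "good", "result": "fail"},
    {"attendance": "low", "assignment": "poor", "result": "fail"},
]


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Reduction in target entropy obtained by splitting on one attribute."""
    base = entropy([row[target] for row in rows])
    groups: Dict[str, List[str]] = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows: List[Dict[str, str]], attributes: Sequence[str], target: str) -> str:
    """Attribute with the highest information gain (the ID3 split choice)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


if __name__ == "__main__":
    print(best_attribute(SAMPLE_ROWS, ["attendance", "assignment"], "result"))
```

In this toy data attendance separates pass from fail completely, so it would 
be chosen as the root of the tree; a full ID3 implementation repeats the same 
choice recursively on each branch. 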

On the other hand, methods such as decision-tree classification, support 
vector machines (SVM), the General Unary Hypotheses Automaton (GUHA), Bayesian 
networks, and linear and logistic regression have their own importance 
compared to other methods. These methods were used to build predictive models 
using data from several Open University (OU) modules. The Open University 
offers a good test-bed for this work. The authors discussed how the 
predictive capacity of the different sources of data changes as the course 
progresses. The work also highlights the importance of understanding how a 
student’s pattern of behaviour changes during the course [29]. Methods such as 
GUHA and Markov chain-based graphical models were explored to provide useful 
insights into students’ behaviour during their studies. On the basis of these 
outcomes, teachers and supporting staff can plan the interventions and steps 
required to help students improve their academic performance [35]. 

The latest work at the Open University used data from the VLE, combined with 
demographic data, to predict student failure or dropout. Machine learning 
techniques and Bayesian models were applied in this distance-learning setting 
to obtain better results. The prediction combines the results of several 
machine learning algorithms: 1. k Nearest Neighbours (k-NN), on demographic 
data; 2. Classification and Regression Tree (CART), on VLE data; 3. a Bayes 
network, which combines both demographic and VLE data. These are used to 
identify at-risk students from their behaviour towards their studies and 
their performance in demonstrating their skills. CART and Bayes models are 
applied to the combined VLE and demographic data [36]. 
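
One simple way to combine the outputs of several models is majority voting. 
The text above does not state which combination rule the Open University work 
used, so majority voting is an assumption here, and the model names, student 
identifiers and the function `combine_at_risk` are hypothetical. 

```python
from typing import Dict

# Hypothetical per-model predictions: True means "flagged as at-risk".
# The model names mirror the text; the values are invented for illustration.
EXAMPLE_OUTPUTS: Dict[str, Dict[str, bool]] = {
    "knn_demographic": {"s1": True, "s2": False, "s3": True},
    "cart_vle": {"s1": True, "s2": False, "s3": False},
    "bayes_combined": {"s1": False, "s2": True, "s3": False},
}


def combine_at_risk(model_outputs: Dict[str, Dict[str, bool]]) -> Dict[str, bool]:
    """Flag a student as at-risk when more than half of the models that
    scored that student flag them (assumed majority-vote rule)."""
    students = set()
    for predictions in model_outputs.values():
        students.update(predictions)
    combined: Dict[str, bool] = {}
    for student in sorted(students):
        votes = [p[student] for p in model_outputs.values() if student in p]
        combined[student] = sum(votes) * 2 > len(votes)
    return combined


if __name__ == "__main__":
    print(combine_at_risk(EXAMPLE_OUTPUTS))
```

A stricter policy, such as flagging a student when any single model raises an 
alarm, would trade more false alarms for fewer missed at-risk students. 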

LIMITATIONS AND FUTURE SCOPE 

To improve the teaching-learning process and obtain better results from 
students, personal interaction between student and teacher is needed. It 
helps the teacher to predict students’ performance and to better understand 
the problems they face. Students can then be assisted more effectively, and 
their risk of failure minimised, by providing the required interventions. 
This paper reviews previous work done in the field of education using various 
data mining tools and techniques. Using these tools and techniques, 
researchers have identified the following factors as affecting students’ 
performance: class attendance, previous class grades and socio-economic 
factors that influence students’ studies. Based on these factors, students 
were categorized as High-risk, Medium-risk and Low-risk students, good and 
bad students, slow and fast learners, etc. The classification techniques 
preferred by researchers for prediction include Neural Networks, Decision 
Trees, Naive Bayes models, k-nearest neighbour, etc. A number of researchers 
have concluded that the decision tree offers high efficiency and accuracy. 
Most of the research in educational data mining is based on online learning, 
and the techniques and methods applied also focus on online education. The 
traditional classroom has taken a back seat in the research, and it needs 
attention as well. Methods that can be applied to the traditional learning 
environment followed by the majority of Indian institutes have not been 
explored properly. 

CONCLUSION 

The experiments reviewed have been carried out in open universities in 
countries other than India. The research in this field relates mainly to 
subjects such as psychology, mathematics, history and home science; very few 
studies focus on technical courses such as computers. The majority of the 
studies predict student performance from demographic, academic and social 
factors that are alien to the Indian environment. Since Indian culture and 
lifestyle are different, a separate study is needed that relates directly to 
the Indian educational system. The Indian education system needs further 
exploration so that the factors affecting students’ performance can be 
studied in the Indian context. 